Search Result

K-means clustering based on adaptive cuckoo optimization feature selection

Lin SUN, Menghan LIU

Journal of Computer Applications 2024, 44 (3): 831-841. DOI: 10.11772/j.issn.1001-9081.2023030351

In the K-means clustering algorithm, the initial number of clusters is determined randomly and the original datasets contain many redundant features, both of which reduce clustering accuracy; in addition, the Cuckoo Search (CS) algorithm suffers from slow convergence and weak local search. To address these issues, a K-means clustering algorithm combined with Dynamic CS Feature Selection (DCFSK) was proposed. First, an adaptive step size factor was designed for the Levy flight phase to improve the search speed and accuracy of the CS algorithm. Then, the discovery probability was dynamically adjusted to balance global and local search and to accelerate the convergence of the CS algorithm. On this basis, an Improved Dynamic CS algorithm (IDCS) was constructed, and a Dynamic CS-based Feature Selection algorithm (DCFS) was built from it. Second, to improve on the accuracy of the traditional Euclidean distance, a weighted Euclidean distance was designed that considers the contributions of both samples and features to the distance calculation. To determine the optimal number of clusters, weighted intra-cluster and inter-cluster distances were constructed from the improved weighted Euclidean distance. Finally, to overcome the defect that the objective function of traditional K-means clustering considers only the distance within clusters and ignores the distance between clusters, an objective function based on the contour coefficient of the median was proposed. Thus, a K-means clustering algorithm based on adaptive cuckoo optimization feature selection was designed. Experimental results show that IDCS achieves the best metrics on ten benchmark test functions. Compared with algorithms such as K-means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise), DCFSK achieves the best clustering results on six synthetic datasets and six UCI datasets.
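The abstract does not give the formulas for the adaptive step size factor or the dynamic discovery probability. As a rough illustration only, the sketch below runs a basic cuckoo search in which both quantities decay linearly over the iterations. The linear schedules, the greedy acceptance of abandoned nests, and all names (`levy_step`, `cuckoo_search`, `alpha`, `pa`) are assumptions made for this example, not the IDCS definitions.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(f, dim, n_nests=15, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimize f over [lo, hi]^dim; returns (best_x, best_f, history)."""
    rng = random.Random(seed)

    def clip(x):
        return min(hi, max(lo, x))

    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(n) for n in nests]
    b = min(range(n_nests), key=lambda i: fit[i])
    best, best_f = nests[b][:], fit[b]
    history = [best_f]

    for t in range(iters):
        frac = t / iters
        alpha = 1.0 - 0.9 * frac  # assumed schedule: step factor 1.0 -> 0.1
        pa = 0.5 - 0.4 * frac     # assumed schedule: discovery prob. 0.5 -> 0.1
        for i in range(n_nests):
            # Levy flight, scaled by the distance to the best nest so far.
            cand = [clip(x + alpha * levy_step(rng) * (x - bx))
                    for x, bx in zip(nests[i], best)]
            cf = f(cand)
            if cf < fit[i]:
                nests[i], fit[i] = cand, cf
            # Discovery: with probability pa, try a random-walk replacement.
            # Kept only if it improves the nest (a simplification).
            if rng.random() < pa:
                j, k = rng.randrange(n_nests), rng.randrange(n_nests)
                cand = [clip(x + rng.random() * (xj - xk))
                        for x, xj, xk in zip(nests[i], nests[j], nests[k])]
                cf = f(cand)
                if cf < fit[i]:
                    nests[i], fit[i] = cand, cf
            if fit[i] < best_f:
                best, best_f = nests[i][:], fit[i]
        history.append(best_f)
    return best, best_f, history
```

Larger values of `alpha` and `pa` early on favor exploration, and smaller values later favor refinement around good nests; any schedule with that shape could be substituted for the linear one used here.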

Feature selection for imbalanced data based on neighborhood tolerance mutual information and whale optimization algorithm

Lin SUN, Jinxu HUANG, Jiucheng XU

Journal of Computer Applications 2023, 43 (6): 1842-1854. DOI: 10.11772/j.issn.1001-9081.2022050691

Most feature selection algorithms do not fully consider the non-uniform class distribution of data, the correlation between features, or the influence of different parameters on the feature selection results. To address these problems, a feature selection method for imbalanced data based on neighborhood tolerance mutual information and the Whale Optimization Algorithm (WOA) was proposed. First, for binary and multi-class datasets in an incomplete neighborhood decision system, two kinds of feature importance for imbalanced data were defined on the basis of the upper and lower boundary regions. Then, to fully reflect the decision-making ability of features and the correlation between features, the neighborhood tolerance mutual information was developed. Finally, by integrating the feature importance of imbalanced data with the neighborhood tolerance mutual information, a Feature Selection for Imbalanced Data based on Neighborhood tolerance mutual information (FSIDN) algorithm was designed. The optimal parameters of the feature selection algorithm were obtained with WOA, and a nonlinear convergence factor and an adaptive inertia weight were introduced to improve WOA and keep it from falling into local optima. Experiments on 8 benchmark functions show that the improved WOA has good optimization performance, and feature selection experiments on 13 binary and 4 multi-class imbalanced datasets show that, compared with other related algorithms, the proposed algorithm effectively selects feature subsets with good classification performance.

Multilabel feature selection algorithm based on Fisher score and fuzzy neighborhood entropy

Lin SUN, Tianjiao MA, Zhan’ao XUE

Journal of Computer Applications 2023, 43 (12): 3779-3789. DOI: 10.11772/j.issn.1001-9081.2022121841

The Fisher score model does not fully consider feature-label and label-label relations, and some neighborhood rough set models tend to neglect the uncertainty of knowledge granulation in the boundary region, which leads to low classification performance. To address these problems, a MultiLabel feature selection algorithm based on Fisher Score and Fuzzy neighborhood entropy (MLFSF) was proposed. First, the Maximum Information Coefficient (MIC) was used to evaluate the degree of feature-label association and to construct the relationship matrix between features and labels, and the correlation between labels was analyzed through a label relationship matrix based on the adjusted cosine similarity. Second, a second-order strategy was given to obtain multiple second-order label relationship groups and reclassify the multilabel domain, in which strong correlations between labels were enhanced and weak correlations were weakened to obtain the score of each feature. In this way, the Fisher score model was improved to preprocess the multilabel data. Third, the multilabel classification margin was introduced to define the adaptive neighborhood radius and neighborhood class, and the upper and lower approximation sets were constructed. On this basis, the multilabel rough membership degree function was presented, and the multilabel neighborhood rough set was mapped to the fuzzy set. Based on the multilabel fuzzy neighborhood, the upper and lower approximation sets and the multilabel fuzzy neighborhood rough set model were developed. The fuzzy neighborhood entropy and the multilabel fuzzy neighborhood entropy were then defined to effectively measure the uncertainty of the boundary region. Finally, the Multilabel Fisher Score-based feature selection algorithm with second-order Label Correlation (MFSLC) was designed, and MLFSF was constructed on top of it. Experimental results on 11 multilabel datasets with the Multi-Label K-Nearest Neighbor (MLKNN) classifier show that, compared with six state-of-the-art algorithms including the Multilabel Feature Selection algorithm based on improved ReliefF (MFSR), MLFSF improves the mean Average Precision (AP) by 2.47 to 6.66 percentage points; MLFSF also obtains the optimal values of all five evaluation metrics on most datasets.
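For orientation, the classical single-label Fisher score that the paper takes as its starting point can be sketched as follows. This is the textbook form (between-class scatter divided by within-class scatter, per feature), not the improved multilabel variant with MIC and second-order label correlation described in the abstract; the function name `fisher_scores` and the handling of zero within-class scatter are choices made for this example.

```python
def fisher_scores(X, y):
    """Classical Fisher score of each feature column of X for labels y."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        mu = sum(col) / n
        between = 0.0  # sum over classes of n_c * (mu_c - mu)^2
        within = 0.0   # sum over classes of n_c * var_c
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            var_c = sum((v - mu_c) ** 2 for v in vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += len(vals) * var_c
        if within > 0:
            scores.append(between / within)
        else:
            # Assumed convention: no within-class spread at all.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores
```

A feature whose class means are far apart relative to its within-class variance receives a high score, so ranking features by this value gives a simple filter-style preprocessing step.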

Feature selection algorithm based on neighborhood rough set and monarch butterfly optimization

Lin SUN, Jing ZHAO, Jiucheng XU, Xinya WANG

Journal of Computer Applications 2022, 42 (5): 1355-1366. DOI: 10.11772/j.issn.1001-9081.2021030497

The classical Monarch Butterfly Optimization (MBO) algorithm cannot handle continuous data well, and the rough set model cannot adequately process large-scale, high-dimensional and complex data. To address these problems, a new feature selection algorithm based on Neighborhood Rough Set (NRS) and MBO was proposed. First, local disturbance and a group division strategy were combined with the MBO algorithm, and a transmission mechanism was constructed to form a Binary MBO (BMBO) algorithm. Second, a mutation operator was introduced to enhance the exploration ability of this algorithm, yielding a BMBO based on Mutation operator (BMBOM) algorithm. Then, a fitness function was developed based on the neighborhood dependence degree in NRS, and the fitness values of the initialized feature subsets were evaluated and sorted. Finally, the BMBOM algorithm was used to search for the optimal feature subset through successive iterations, and a meta-heuristic feature selection algorithm was designed. The optimization performance of the BMBOM algorithm was evaluated on benchmark functions, and the classification performance of the proposed feature selection algorithm was evaluated on UCI datasets. Experimental results show that the proposed BMBOM algorithm is significantly better than the MBO and Particle Swarm Optimization (PSO) algorithms in terms of the optimal value, worst value, average value and standard deviation on five benchmark functions. Compared with optimized rough-set-based feature selection algorithms, feature selection algorithms that combine rough sets with optimization algorithms, those that combine NRS with optimization algorithms, and those based on binary grey wolf optimization, the proposed feature selection algorithm performs well on three indicators on UCI datasets (classification accuracy, number of selected features, and fitness value) and can select an optimal feature subset with few features and high classification accuracy.
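The neighborhood dependence degree used in the fitness function can be illustrated with the standard NRS definition: the fraction of samples whose neighbors, measured on the selected features, all share the sample's own decision label. The sketch below assumes Euclidean distance and a fixed radius `delta`; the abstract does not say how the paper combines this value with other terms in its fitness function, so only the dependence degree itself is shown.

```python
import math


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples in the positive region for the given feature subset.

    A sample is in the positive region when every sample within distance
    `delta` of it (on the selected features) carries the same label.
    """
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in features))

    positive = 0
    for i, xi in enumerate(X):
        consistent = all(
            y[k] == y[i]
            for k, xk in enumerate(X)
            if dist(xi, xk) <= delta
        )
        positive += 1 if consistent else 0
    return positive / len(X)
```

In a binary-coded search such as BMBOM, each candidate solution is a 0/1 mask over the features; the indices set to 1 would be passed as `features`, and a higher dependence degree indicates a subset that separates the classes better at the chosen radius.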

Drowsiness recognition algorithm based on human eye state

Lin SUN, Yubo YUAN

Journal of Computer Applications 2021, 41 (11): 3213-3218. DOI: 10.11772/j.issn.1001-9081.2020122058

Most existing drowsiness recognition algorithms are based on machine learning or deep learning and do not consider the relationship between the sequence of eye-closed states and drowsiness. To solve this problem, a drowsiness recognition algorithm based on human eye state was proposed. First, a human eye segmentation and area calculation model was proposed. Based on 68 facial feature points, the eye area was segmented according to the maximal polygon formed by the eye feature points, and the total number of eye pixels was used to represent the size of the eye area. Second, the area of the eye in its maximum state was calculated, a key frame selection algorithm was used to select the 4 frames that best represent the eye-open state, and the eye opening threshold was calculated from the eye areas in these 4 frames and in the maximum state. On this basis, an eye closure degree score model was constructed to determine the closed state of the eye. Finally, according to the eye closure degree score sequence of the input video, a drowsiness recognition model was constructed based on continuous multi-frame sequence analysis. Drowsiness state recognition was conducted on two commonly used international datasets, the Yawning Detection Dataset (YawDD) and the NTHU-DDD dataset. Experimental results show that the recognition accuracy of the proposed algorithm exceeds 80% on both datasets and is above 94% on YawDD. The proposed algorithm can be applied to tasks such as driver status detection during driving and learner status analysis in class.
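The final step, deciding drowsiness from a per-frame eye closure score sequence, can be pictured as a run-length check over consecutive frames. The sketch below is a simplified stand-in for the paper's multi-frame sequence analysis: the rule "drowsy if the eyes stay closed for at least `min_closed_frames` consecutive frames" and the default values of `closed_threshold` and `min_closed_frames` are assumptions for illustration, not values from the paper.

```python
def longest_closed_run(scores, closed_threshold):
    """Length of the longest run of consecutive frames judged as closed."""
    longest = current = 0
    for s in scores:
        # A frame counts as closed when its closure score reaches the threshold.
        current = current + 1 if s >= closed_threshold else 0
        longest = max(longest, current)
    return longest


def is_drowsy(scores, closed_threshold=0.8, min_closed_frames=15):
    """Flag drowsiness when the eyes stay closed long enough (assumed rule)."""
    return longest_closed_run(scores, closed_threshold) >= min_closed_frames
```

With these assumed defaults, a short blink of a few frames is ignored while a prolonged closure is flagged, which is the distinction that a sequence-based model is meant to capture.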

CUDA based parallel implementation of simultaneous algebraic reconstruction technique

SHI Huai-lin, SUN Feng-rong, JIANG Wei, LIU Wei, QIN Tong, LI Xin-cai

Journal of Computer Applications 2011, 31 (05): 1245-1248. DOI: 10.3724/SP.J.1087.2011.01245

The Simultaneous Algebraic Reconstruction Technique (SART) can generate Computed Tomography (CT) images of higher quality than the Filtered Back-Projection (FBP) method when the projection data are incomplete or noisy. However, it is very time-consuming, and parallel computation is an efficient way to address this problem. In this study, a new parallel implementation of SART based on the Compute Unified Device Architecture (CUDA) platform was proposed. The experimental results show that the images reconstructed by the new method do not differ from those of the serial implementation, while the reconstruction time is greatly reduced, which makes the method more suitable for clinical application.
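To show what is being parallelized, here is a minimal serial sketch of the standard SART update for a small dense system `A x = b`, written in plain Python rather than CUDA. The function name `sart`, the zero initial image, and the dense list-of-lists representation of the system matrix are choices made for this example; the paper's kernel layout is not described in the abstract.

```python
def sart(A, b, iters=50, relax=1.0):
    """Standard SART iterations for A x = b with non-negative weights A."""
    m, n = len(A), len(A[0])
    row_sums = [sum(row) for row in A]                           # one per ray
    col_sums = [sum(A[i][j] for i in range(m)) for j in range(n)]  # one per pixel
    x = [0.0] * n
    for _ in range(iters):
        # Forward projection: residual of every ray, normalized by its row sum.
        r = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n))) / row_sums[i]
            if row_sums[i] > 0 else 0.0
            for i in range(m)
        ]
        # Back projection: all pixels are updated from the same residuals.
        for j in range(n):
            if col_sums[j] > 0:
                x[j] += relax * sum(A[i][j] * r[i] for i in range(m)) / col_sums[j]
    return x
```

Within one iteration, each ray residual depends only on the current image and each pixel update depends only on the residuals, so both loops consist of independent work items; this independence is what a GPU implementation can exploit.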